Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads
Abstract
High-throughput sequencing technologies enable metagenome profiling, the simultaneous sequencing of multiple microbial species present within an environmental sample. Because metagenomic data include sequence fragments ("reads") from organisms that are absent from every database, new algorithms are needed to identify and annotate novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but composition-based methods have not been adapted. We develop a detection technique that discriminates between "known" and "unknown" taxa and can be used with composition-based methods as well as with a hybrid method. Unlike previous studies, we rigorously evaluate all algorithms for their ability to detect novel taxa. First, we show that integrating a detector with a composition-based method performs significantly better than homology-based methods at detecting novel species and genera, with the best performance at finer taxonomic resolutions. Most importantly, we evaluate all the algorithms after introducing an "unknown" class and show that the modified version of PhymmBL has similar or better overall classification performance than the other modified algorithms, especially at the species level and for ultrashort reads. Finally, we evaluate the performance of several algorithms on a real acid mine drainage dataset.
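To make the idea of a composition-based classifier with an "unknown" class more concrete, here is a minimal sketch. It is not the detector developed in this work, nor PhymmBL's implementation: it swaps PhymmBL's interpolated Markov models for a plain fixed-order Markov chain with add-one smoothing, and labels a read "unknown" when its best average per-base log-likelihood falls below a threshold. The function names (train_markov, avg_log_likelihood, classify), the Markov order of 2, the threshold of -0.5 and the toy reference sequences are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seq: str, order: int = 2):
    """Count (context -> next base) transitions in a reference sequence.

    Simplified stand-in for a per-taxon composition model (hypothetical).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return counts


def avg_log_likelihood(read: str, counts, order: int = 2, pseudo: float = 1.0) -> float:
    """Average per-base log-probability of `read` under an add-one-smoothed Markov chain."""
    total, n = 0.0, 0
    for i in range(order, len(read)):
        ctx, base = read[i - order:i], read[i]
        ctx_counts = counts.get(ctx, {})  # unseen context -> uniform over ACGT
        denom = sum(ctx_counts.values()) + pseudo * len(ALPHABET)
        total += math.log((ctx_counts.get(base, 0) + pseudo) / denom)
        n += 1
    return total / n if n else float("-inf")


def classify(read: str, models: dict, threshold: float, order: int = 2) -> str:
    """Assign the best-scoring taxon, or "unknown" if even the best score is too low."""
    scores = {name: avg_log_likelihood(read, m, order) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unknown"


if __name__ == "__main__":
    # Toy "reference genomes" for two known taxa (hypothetical data).
    models = {
        "taxonA": train_markov("ACGT" * 50),
        "taxonB": train_markov("AATTGGCC" * 25),
    }
    print(classify("ACGTACGTACGT", models, threshold=-0.5))  # taxonA
    print(classify("AATTGGCCAATT", models, threshold=-0.5))  # taxonB
    print(classify("GATCGATCGATC", models, threshold=-0.5))  # unknown
```

The rejection threshold here is picked only to separate the toy reads; in a real setting it would have to be calibrated, for example on reads from taxa held out of the reference set, and it would likely depend on read length.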
Volume: 2011
Publication date: 2011